Cultivating Trees: Adding Several Semantic Layers to the Lassy Treebank in SoNaR

Authors

  • Ineke Schuurman
  • Veronique Hoste
  • Paola Monachesi
Abstract

Within the STEVIN project Large Scale Syntactic Annotation of written Dutch (LASSY), a manually corrected treebank of 1 million words is constructed. Lassy is part of a series of annotation projects for modern written and spoken Dutch. More specifically, it is an extension of the D-Coi and CGN projects, and constitutes the core of SoNaR, a 500-million-word reference corpus of modern written Dutch. One of the goals of the latter project is to enrich the corrected treebank produced in Lassy with several semantic layers. For a general overview of the relations between D-Coi, Lassy and SoNaR, cf. [19]. In this paper we will concentrate on the semantic layers of the SoNaR core: (1) named entity labeling, (2) annotation of co-reference relations, (3) semantic role labeling and (4) annotation of spatial and temporal relations. Of these, (2) originates from the STEVIN project COREA, (3) and (4) from D-Coi, whereas (1) is a new area within STEVIN.

Similar resources

Example-Based Treebank Querying

The recent construction of large linguistic treebanks for spoken and written Dutch (e.g. CGN, LASSY, Alpino) has created new and exciting opportunities for the empirical investigation of Dutch syntax and semantics. However, the exploitation of those treebanks requires knowledge of specific data structures and query languages such as XPath. Linguists who are unfamiliar with formal languages are ...


Using corpora tools to analyze gradable nouns in Dutch

In this paper, we expand Morzycki (2009)’s claims that degree readings of size adjectives are attributed to syntax. We introduce a corpus-based analysis in Dutch to verify and extend his claim into the semantic domain. Using the LASSY Treebank, we extract syntactic and semantic properties of noun phrases consisting of the adjectives “gigantisch”, “kolossaal”, and “reusachtig” and manually annot...


Large Scale Syntactic Annotation of Written Dutch: Lassy

The construction of a 500-million-word reference corpus of written Dutch has been identified as one of the priorities in the STEVIN programme. The focus is on written language in order to complement the Spoken Dutch Corpus (CGN) [13], completed in 2003. In D-COI (a pilot project funded by STEVIN), a 50-million-word pilot corpus has been compiled, parts of which were enriched with verified synta...


Increasing Return on Annotation Investment: The Automatic Construction of a Universal Dependency Treebank for Dutch

We present a method for automatically converting the Dutch Lassy Small treebank, a phrasal dependency treebank, to UD. All of the information required to produce accurate UD annotation appears to be available in the underlying annotation. However, we also note that the close connection between POS-tags and dependency labels that is present in UD is missing in the Lassy treebanks. As a consequen...


Scaling-up RAAMs

Modifications to Recursive Auto-Associative Memory are presented, which allow it to store deeper and more complex data structures than previously reported. These modifications include adding extra layers to the compressor and reconstructor networks, employing integer rather than real-valued representations, pre-conditioning the weights and pre-setting the representations to be compatible with the...




Publication date: 2008